Flow Matching

Flow Matching is a simulation-free approach to training continuous normalizing flows: a neural network directly regresses a vector field that generates a desired probability path. Unlike [[Diffusion Model|diffusion models]], which are usually derived from [[Stochastic Differential Equation (SDE)|SDE]] theory, Flow Matching provides a simpler and more flexible framework for generative modeling based on ordinary differential equations.


1. Core Concept

1.1 Motivation

Problems with existing methods:

  1. [[Diffusion Model|Diffusion Models]]:

    • Require complex [[Stochastic Differential Equation (SDE)|SDE]]/ODE theory
    • Need to solve Fokker-Planck equation
    • Constrained by specific noise schedules
  2. [[Continuous Normalizing Flow]]:

    • Likelihood computation is expensive
    • Training requires simulating ODE trajectories
    • Limited architectural choices
  3. Score-Based Models:

    • Require score matching objectives
    • Complex mathematical derivation

Flow Matching solves these by:

  • Direct vector field regression (no simulation needed)
  • Flexible conditional flow design
  • Simpler mathematical foundation
  • Connections to optimal transport

1.2 Key Idea

Instead of deriving the ODE from a stochastic process, Flow Matching directly learns a velocity field that transports samples from a simple distribution (e.g., Gaussian) to the data distribution.

$$\frac{dx}{dt} = v_\theta(x, t)$$

where $v_\theta(x, t)$ is a velocity field parameterized by a neural network.

[!NOTE] Core Insight
Flow Matching bypasses the need for [[Stochastic Differential Equation (SDE)|SDE]] theory and score matching by directly learning the velocity field that generates the desired probability flow, making it conceptually simpler and more flexible.


2. Mathematical Foundation

2.1 Continuous Normalizing Flows

A continuous normalizing flow (CNF) is defined by an ODE:

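$$\frac{dx}{dt} = v(x, t), \quad x(0) \sim p_0, \quad x(1) \sim p_1$$

where:

  • $p_0$ : Simple prior distribution (e.g., $\mathcal{N}(0, I)$)
  • $p_1$ : Target data distribution
  • $v(x, t)$ : Time-dependent velocity field

The probability path $p_t(x)$ evolves according to the continuity equation:

$$\frac{\partial p_t(x)}{\partial t} = -\nabla_x \cdot \left[ v(x, t)\, p_t(x) \right]$$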
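As a sanity check on the continuity equation, the sketch below evaluates its residual numerically for a 1-D Gaussian path. The schedule `sigma(t) = 1 - 0.5 t` and the helper names are illustrative assumptions, not part of the method itself; the velocity `v(x, t) = (sigma'(t) / sigma(t)) x` is the field that generates this particular path.

```python
import numpy as np

def sigma(t):
    # Hypothetical schedule: standard deviation shrinks linearly from 1.0 to 0.5
    return 1.0 - 0.5 * t

def density(x, t):
    # p_t(x) = N(x | 0, sigma(t)^2)
    s = sigma(t)
    return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def velocity(x, t):
    # v(x, t) = (sigma'(t) / sigma(t)) * x, with sigma'(t) = -0.5
    return (-0.5 / sigma(t)) * x

def continuity_residual(x, t, h=1e-4):
    # dp/dt + d(v p)/dx, both by central finite differences
    dp_dt = (density(x, t + h) - density(x, t - h)) / (2.0 * h)
    flux = lambda y: velocity(y, t) * density(y, t)
    dflux_dx = (flux(x + h) - flux(x - h)) / (2.0 * h)
    return dp_dt + dflux_dx

xs = np.linspace(-3.0, 3.0, 61)
max_residual = np.max(np.abs(continuity_residual(xs, t=0.3)))
print(max_residual)  # should be close to zero (finite-difference error only)
```

The residual vanishes up to discretization error, which is what it means for a velocity field to generate a probability path.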

2.2 Flow Matching Objective

Goal: Learn $v_\theta(x, t)$ such that the generated probability path $p_t(x)$ matches the desired path.

Flow Matching Loss:

$$\mathcal{L}_{FM}(\theta) = \mathbb{E}_{t \sim U[0,1],\; x \sim p_t(x)} \left[ \| v_\theta(x, t) - u_t(x) \|^2 \right]$$

where $u_t(x)$ is the target vector field that generates the desired probability path.

2.3 The Challenge

The problem: the marginal path $p_t(x)$ and its vector field $u_t(x)$ are intractable. There is no closed form for $u_t(x)$, so the regression target in the FM loss cannot be evaluated directly.

Solution: Use Conditional Flow Matching (CFM).


3. Conditional Flow Matching (CFM)

3.1 Key Insight

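Instead of matching the marginal vector field $u_t(x)$, we match conditional vector fields $u_t(x \mid z)$ given a conditioning variable $z$.

Conditional Flow Matching Loss:

$$\mathcal{L}_{CFM}(\theta) = \mathbb{E}_{t \sim U[0,1],\; z \sim p(z),\; x \sim p_t(x \mid z)} \left[ \| v_\theta(x, t) - u_t(x \mid z) \|^2 \right]$$

Theorem: If $p_t(x) > 0$ for all $x$ and $t$, the two losses differ only by a constant independent of $\theta$, so $\nabla_\theta \mathcal{L}_{FM} = \nabla_\theta \mathcal{L}_{CFM}$. Minimizing CFM therefore also minimizes FM.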
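To make the CFM objective concrete, here is a minimal Monte Carlo sketch on a 1-D toy problem. Everything specific in it is an assumption chosen for illustration: the data distribution $\mathcal{N}(3, 0.5^2)$, the straight-line conditional path, and the helper names. For this Gaussian toy case the marginal velocity $\mathbb{E}[x_1 - x_0 \mid x_t = x]$ has a closed form, so its loss can be compared with that of a trivial zero field.

```python
import numpy as np

M1, S1 = 3.0, 0.5  # hypothetical 1-D data distribution p_1 = N(3, 0.5^2)

def marginal_velocity(x, t):
    # Closed form of E[x1 - x0 | x_t = x] for this Gaussian toy problem
    var_t = (1.0 - t) ** 2 + (t * S1) ** 2
    slope = (t * S1 ** 2 - (1.0 - t)) / var_t
    return M1 + slope * (x - t * M1)

def cfm_loss(v, n=50_000, seed=0):
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, 1.0, n)   # t ~ U[0, 1]
    x1 = rng.normal(M1, S1, n)     # conditioning variable z = x1 ~ p_1
    x0 = rng.standard_normal(n)    # noise that defines the conditional sample
    xt = (1.0 - t) * x0 + t * x1   # x ~ p_t(x | x1)
    target = x1 - x0               # u_t(x | x1) evaluated at this sample
    return np.mean((v(xt, t) - target) ** 2)

loss_exact = cfm_loss(marginal_velocity)
loss_zero = cfm_loss(lambda x, t: np.zeros_like(x))
print(loss_exact, loss_zero)
```

The loss of the marginal velocity is not zero, because the conditional target still varies for a fixed $x_t$, but it is far below the loss of the zero field. This is the sense in which regressing on conditional targets recovers the marginal field.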

3.2 Conditional Probability Path

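Given a data point $z = x_1 \sim p_1$, we define a conditional probability path:

$$p_t(x \mid x_1) = \mathcal{N}\left(x \mid \mu_t(x_1),\, \sigma_t^2(x_1) I\right)$$

where:

  • $\mu_t(x_1)$ : Time-dependent mean
  • $\sigma_t(x_1)$ : Time-dependent standard deviation

Boundary conditions:

  • $t=0$ : $p_0(x \mid x_1) = \mathcal{N}(x \mid 0, I)$ (prior)
  • $t=1$ : $p_1(x \mid x_1) = \delta(x - x_1)$ (data point)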

3.3 Conditional Vector Field

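For the Gaussian conditional path, the conditional vector field is:

$$u_t(x \mid x_1) = \frac{d\mu_t}{dt} + \frac{d\sigma_t}{dt} \cdot \frac{x - \mu_t}{\sigma_t}$$

Simplified form:

$$u_t(x \mid x_1) = \frac{\dot{\sigma}_t}{\sigma_t} (x - \mu_t) + \dot{\mu}_t$$

where $\dot{\mu}_t = \frac{d\mu_t}{dt}$ and $\dot{\sigma}_t = \frac{d\sigma_t}{dt}$.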
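The formula can be read as the time derivative of the conditional flow $\psi_t(x_0) = \sigma_t x_0 + \mu_t$, which pushes $\mathcal{N}(0, I)$ onto $p_t(\cdot \mid x_1)$. The sketch below checks this numerically. The trigonometric schedules are arbitrary assumptions picked only so the check is not trivial, and the derivatives are taken by finite differences rather than analytically.

```python
import numpy as np

def conditional_velocity(x, x1, t, mu, sigma, h=1e-5):
    # u_t(x | x1) = (sigma'_t / sigma_t) * (x - mu_t) + mu'_t,
    # with time derivatives taken by central finite differences
    d_mu = (mu(t + h, x1) - mu(t - h, x1)) / (2.0 * h)
    d_sigma = (sigma(t + h, x1) - sigma(t - h, x1)) / (2.0 * h)
    return d_sigma / sigma(t, x1) * (x - mu(t, x1)) + d_mu

# Hypothetical schedules used only for this check
mu = lambda t, x1: np.sin(0.5 * np.pi * t) * x1
sigma = lambda t, x1: np.cos(0.5 * np.pi * t) + 0.01

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)
x1 = rng.standard_normal(4)
t, h = 0.4, 1e-5

# Conditional flow psi_t(x0) = sigma_t * x0 + mu_t and its speed d psi / dt
psi = lambda s: sigma(s, x1) * x0 + mu(s, x1)
flow_speed = (psi(t + h) - psi(t - h)) / (2.0 * h)

u = conditional_velocity(psi(t), x1, t, mu, sigma)
max_gap = np.max(np.abs(u - flow_speed))
print(max_gap)  # should be at round-off level
```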


4. Common Flow Designs

4.1 Optimal Transport Flow

Idea: Transport points along straight lines from noise to data.

Conditional path:

$$x_t = (1 - t)\, x_0 + t\, x_1, \quad x_0 \sim \mathcal{N}(0, I), \quad x_1 \sim p_1$$

Conditional vector field:

$$u_t(x \mid x_1) = x_1 - x_0 = \frac{x_1 - x_t}{1 - t}$$

Advantages:

  • Straight trajectories (easy to integrate)
  • Minimal transport cost for each conditional path (the resulting marginal flow is not guaranteed to be the optimal transport map)
  • Fast sampling (few ODE steps needed)
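The two expressions for the conditional vector field, and the reason a straight path needs so few solver steps, can be checked in a few lines. This is a minimal sketch with made-up sample data; the shifted Gaussian standing in for "data" is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.standard_normal((5, 2))         # noise samples
x1 = rng.standard_normal((5, 2)) + 4.0   # stand-in "data" samples
t = rng.uniform(0.0, 0.9, size=(5, 1))   # stay away from t = 1 (division by 1 - t)

xt = (1.0 - t) * x0 + t * x1
u_from_endpoints = x1 - x0
u_from_state = (x1 - xt) / (1.0 - t)
print(np.allclose(u_from_endpoints, u_from_state))  # True

# The velocity is constant along each conditional path, so a single Euler
# step of size 1 from x0 lands exactly on x1.
print(np.allclose(x0 + 1.0 * u_from_endpoints, x1))  # True
```

Note that this holds per conditional path. The learned marginal field averages over many such paths, so in practice more than one step is usually needed.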

4.2 Gaussian Conditional Flow

General form:

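$$\mu_t(x_1) = t\, x_1, \quad \sigma_t(x_1) = 1 - (1 - \sigma_{min})\, t$$

Conditional vector field:

$$u_t(x \mid x_1) = \frac{x_1 - (1 - \sigma_{min})\, x}{1 - (1 - \sigma_{min})\, t}$$

where $\sigma_{min}$ is a small constant (e.g., $0.001$) for numerical stability.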
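A short check, under the schedule above, that this vector field evaluated on the path $x_t = \sigma_t x_0 + \mu_t$ reduces to the constant $x_1 - (1 - \sigma_{min}) x_0$, and that $\sigma_{min} > 0$ keeps the denominator away from zero at $t = 1$. The function name is a hypothetical helper.

```python
import numpy as np

def gaussian_cond_velocity(x, x1, t, sigma_min=1e-3):
    # u_t(x | x1) = (x1 - (1 - sigma_min) x) / (1 - (1 - sigma_min) t)
    return (x1 - (1.0 - sigma_min) * x) / (1.0 - (1.0 - sigma_min) * t)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(3)
x1 = rng.standard_normal(3)
sigma_min = 1e-3

checks = []
for t in (0.0, 0.5, 1.0):
    # A point on the conditional path: x_t = sigma_t * x0 + mu_t
    xt = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    u = gaussian_cond_velocity(xt, x1, t, sigma_min)
    checks.append(np.allclose(u, x1 - (1.0 - sigma_min) * x0))
print(checks)  # [True, True, True]
```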

4.3 Variance Exploding Flow

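Similar to the VE-[[Stochastic Differential Equation (SDE)|SDE]] in diffusion models, the mean stays at the data point, $\mu_t = x_1$, and only the noise scale changes. With noise at $t=0$ and data at $t=1$ (the convention used in this note), the geometric schedule runs from $\sigma_{max}$ down to $\sigma_{min}$:

$$\sigma_t = \sigma_{min} \left( \frac{\sigma_{max}}{\sigma_{min}} \right)^{1-t}$$

Conditional vector field (using $\dot{\mu}_t = 0$):

$$u_t(x \mid x_1) = \frac{\dot{\sigma}_t}{\sigma_t} (x - x_1) = -\ln\left( \frac{\sigma_{max}}{\sigma_{min}} \right) (x - x_1)$$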

4.4 Comparison of Flow Designs

| Flow Type | Trajectory | Transport Cost | Sampling Speed | Stability |
| --- | --- | --- | --- | --- |
| OT Flow | Straight | Minimal | Very Fast | Good |
| Gaussian | Curved | Moderate | Fast | Very Good |
| VE Flow | Curved | Higher | Medium | Good |
| VP Flow | Curved | Moderate | Fast | Very Good |

5. Training Algorithm

5.1 Flow Matching Training

```python
import torch
import torch.nn.functional as F

# Conditional Flow Matching training loss
def flow_matching_loss(model, x1, t, sigma_min=1e-3):
    """
    model: Neural network v_theta(x, t)
    x1: Data samples from p_1, shape (batch, ...)
    t: Time steps sampled from U[0, 1], shape (batch,)
    sigma_min: Standard deviation of the conditional path at t = 1
    """
    # Sample noise from prior
    x0 = torch.randn_like(x1)  # x0 ~ N(0, I)

    # Reshape t to (batch, 1, ..., 1) so it broadcasts against x1
    t_b = t.view(-1, *([1] * (x1.dim() - 1)))

    # Construct conditional path
    mu_t = t_b * x1
    sigma_t = 1 - (1 - sigma_min) * t_b

    # Sample x_t from the conditional distribution, reusing x0 as the noise
    # so that the regression target below matches this sample
    xt = mu_t + sigma_t * x0

    # Target vector field d(xt)/dt; equals x1 - x0 when sigma_min = 0
    ut = x1 - (1 - sigma_min) * x0

    # Predict velocity field
    v_theta = model(xt, t)

    # MSE loss
    return F.mse_loss(v_theta, ut)
```

5.2 Complete Training Loop

```python
for epoch in range(num_epochs):
    for x1 in dataloader:
        # Sample time
        t = torch.rand(x1.shape[0], device=x1.device)

        # Compute loss
        loss = flow_matching_loss(model, x1, t)

        # Update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

5.3 Key Differences from Diffusion Models

| Aspect | Diffusion Models | Flow Matching |
| --- | --- | --- |
| Objective | Score matching / ELBO | Vector field regression |
| Mathematical foundation | [[Stochastic Differential Equation (SDE)\|SDE]] theory | ODE theory |
| Target | Score function $\nabla_x \log p_t(x)$ | Velocity field $v(x, t)$ |
| Training | Denoising objective | Direct regression |
| Flexibility | Constrained by [[Stochastic Differential Equation (SDE)\|SDE]] | Arbitrary flow design |
| Likelihood | Tractable via ODE | Tractable via ODE |

6. Sampling Algorithm

6.1 ODE Integration

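```python
import numpy as np
import torch
from scipy.integrate import solve_ivp

# Flow Matching sampling with an adaptive SciPy solver (CPU only)
@torch.no_grad()
def sample(model, batch_size, dim, num_steps=50):
    """
    model: Trained velocity field network v_theta(x, t)
    batch_size, dim: Shape of the batch to generate
    num_steps: Number of time points at which the solution is reported
               (RK45 chooses its own internal step sizes)
    """
    # Sample from prior
    x_0 = torch.randn(batch_size, dim)

    # solve_ivp works on flat float64 NumPy arrays, so wrap the model
    def ode_func(t, x_flat):
        x = torch.as_tensor(x_flat, dtype=torch.float32).reshape(batch_size, dim)
        t_batch = torch.full((batch_size,), float(t))
        return model(x, t_batch).reshape(-1).double().cpu().numpy()

    # Solve ODE from t=0 to t=1
    t_eval = np.linspace(0.0, 1.0, num_steps)
    solution = solve_ivp(ode_func, (0.0, 1.0), x_0.reshape(-1).double().numpy(),
                         t_eval=t_eval, method='RK45')

    # Return final state
    x_1 = torch.as_tensor(solution.y[:, -1].copy(), dtype=torch.float32)
    return x_1.reshape(batch_size, dim)
```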

6.2 Euler Method (Simple)

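```python
import torch

@torch.no_grad()
def euler_sample(model, x_0, num_steps=100):
    """Integrate dx/dt = v_theta(x, t) from t=0 to t=1 with fixed-step Euler."""
    x_t = x_0
    dt = 1.0 / num_steps

    for i in range(num_steps):
        # One time value per sample, matching the shape used in training
        t = torch.full((x_0.shape[0],), i * dt, device=x_0.device)
        v = model(x_t, t)
        x_t = x_t + v * dt

    return x_t
```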
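The Euler update can be exercised without a trained network by plugging in a velocity field that is known in closed form. The sketch below is a NumPy illustration under stated assumptions: a 1-D target $\mathcal{N}(3, 0.5^2)$, a standard normal prior, and the exact marginal velocity of the straight-line path with independent endpoints standing in for the model. The names `velocity` and `euler_sample_np` are hypothetical.

```python
import numpy as np

M1, S1 = 3.0, 0.5  # hypothetical 1-D target p_1 = N(3, 0.5^2)

def velocity(x, t):
    # Exact marginal velocity for x_t = (1 - t) x0 + t x1 with independent
    # x0 ~ N(0, 1) and x1 ~ N(M1, S1^2); stands in for a trained network
    var_t = (1.0 - t) ** 2 + (t * S1) ** 2
    slope = (t * S1 ** 2 - (1.0 - t)) / var_t
    return M1 + slope * (x - t * M1)

def euler_sample_np(v, x0, num_steps=200):
    x, dt = x0.copy(), 1.0 / num_steps
    for i in range(num_steps):
        x = x + v(x, i * dt) * dt
    return x

rng = np.random.default_rng(0)
samples = euler_sample_np(velocity, rng.standard_normal(10_000))
print(samples.mean(), samples.std())  # should be close to 3.0 and 0.5
```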

6.3 Advanced ODE Solvers

| Solver | Order | Steps Needed | Characteristics |
| --- | --- | --- | --- |
| Euler | 1st | 100-200 | Simple, slow |
| RK4 | 4th | 50-100 | Accurate, moderate |
| DOPRI5 | Adaptive | 20-50 | Automatic step size |
| [[DPM-Solver]] | Specialized | 10-20 | Fast for diffusion |
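The effect of solver order can be seen on a toy field whose exact flow map is known. The sketch below compares hand-written Euler and classical RK4 steps at equal step counts. It assumes the same 1-D Gaussian toy problem as a stand-in for a trained model (target $\mathcal{N}(3, 0.5^2)$, prior $\mathcal{N}(0, 1)$), for which the exact map at $t = 1$ is $x_0 \mapsto 3 + 0.5\, x_0$. The step counts in the table above refer to real models and are not reproduced here.

```python
import numpy as np

M1, S1 = 3.0, 0.5  # hypothetical toy target N(3, 0.5^2), prior N(0, 1)

def velocity(x, t):
    var_t = (1.0 - t) ** 2 + (t * S1) ** 2
    return M1 + (t * S1 ** 2 - (1.0 - t)) / var_t * (x - t * M1)

def euler_step(v, x, t, dt):
    return x + dt * v(x, t)

def rk4_step(v, x, t, dt):
    # Classical 4th-order Runge-Kutta: four velocity evaluations per step
    k1 = v(x, t)
    k2 = v(x + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = v(x + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = v(x + dt * k3, t + dt)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def integrate(step, x0, num_steps):
    x, dt = x0.copy(), 1.0 / num_steps
    for i in range(num_steps):
        x = step(velocity, x, i * dt, dt)
    return x

x0 = np.linspace(-2.0, 2.0, 9)
exact = M1 + S1 * x0  # exact flow map of this toy field at t = 1
for n in (5, 10, 20):
    err_euler = np.max(np.abs(integrate(euler_step, x0, n) - exact))
    err_rk4 = np.max(np.abs(integrate(rk4_step, x0, n) - exact))
    print(n, err_euler, err_rk4)
```

RK4 costs four model evaluations per step, so a fair comparison is by total number of function evaluations rather than by step count.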

7. Theoretical Analysis

7.1 Equivalence to Score Matching

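Theorem: For Gaussian conditional paths, the conditional velocity is an affine function of the conditional score, so Flow Matching and score matching regress equivalent targets up to a time-dependent weighting of the loss.

For the conditional flow:

$$u_t(x \mid x_1) = -\sigma_t \dot{\sigma}_t \nabla_x \log p_t(x \mid x_1) + \dot{\mu}_t$$

This follows from $\nabla_x \log p_t(x \mid x_1) = -(x - \mu_t) / \sigma_t^2$ and shows the connection between velocity fields and score functions.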
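The relation between score and velocity can be verified numerically for the Gaussian conditional flow of Section 4.2 ($\mu_t = t x_1$, $\sigma_t = 1 - (1 - \sigma_{min}) t$). This is a minimal sketch; the function names are hypothetical helpers, and the sign convention assumed is $u_t = -\sigma_t \dot{\sigma}_t \nabla_x \log p_t + \dot{\mu}_t$.

```python
import numpy as np

def cond_score(x, mu_t, sigma_t):
    # grad_x log N(x | mu_t, sigma_t^2 I)
    return -(x - mu_t) / sigma_t ** 2

def velocity_from_score(score, sigma_t, d_sigma_t, d_mu_t):
    # u_t(x | x1) = -sigma_t * sigma'_t * score + mu'_t
    return -sigma_t * d_sigma_t * score + d_mu_t

sigma_min, t = 1e-3, 0.7
rng = np.random.default_rng(0)
x = rng.standard_normal(3)
x1 = rng.standard_normal(3)

mu_t, sigma_t = t * x1, 1.0 - (1.0 - sigma_min) * t
d_mu_t, d_sigma_t = x1, -(1.0 - sigma_min)

u_score = velocity_from_score(cond_score(x, mu_t, sigma_t), sigma_t, d_sigma_t, d_mu_t)
u_direct = (x1 - (1.0 - sigma_min) * x) / sigma_t
print(np.allclose(u_score, u_direct))  # True
```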

7.2 Likelihood Computation

Using the instantaneous change of variables formula:

$$\frac{d}{dt} \log p_t(x(t)) = -\nabla_x \cdot v(x(t), t)$$

Integrating from $t=0$ to $t=1$:

$$\log p_1(x_1) = \log p_0(x_0) - \int_0^1 \nabla_x \cdot v(x(t), t)\, dt$$
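The integral formula can be checked on a toy problem where everything is available in closed form. The sketch below assumes a 1-D linear field $v(x, t) = 3 + a(t)(x - 3t)$ that transports $\mathcal{N}(0, 1)$ to $\mathcal{N}(3, 0.5^2)$; its divergence is simply $a(t)$ and its flow map at $t = 1$ is $x_0 \mapsto 3 + 0.5\, x_0$. With a learned model, $x_0$ would instead be found by integrating the ODE backward from $x_1$.

```python
import numpy as np

M1, S1 = 3.0, 0.5  # hypothetical toy target N(3, 0.5^2), prior N(0, 1)

def slope(t):
    # Divergence of the 1-D field v(x, t) = M1 + slope(t) * (x - t * M1)
    return (t * S1 ** 2 - (1.0 - t)) / ((1.0 - t) ** 2 + (t * S1) ** 2)

def log_normal(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)

def log_likelihood(x1, num_steps=1000):
    # Invert the flow map in closed form to recover the starting point x0
    x0 = (x1 - M1) / S1
    # Integrate the divergence over [0, 1] with the midpoint rule
    t_mid = (np.arange(num_steps) + 0.5) / num_steps
    div_integral = np.sum(slope(t_mid)) / num_steps
    return log_normal(x0, 0.0, 1.0) - div_integral

x1 = np.array([2.0, 3.0, 3.7])
flow_logp = log_likelihood(x1)
true_logp = log_normal(x1, M1, S1)  # reference: exact log-density of the target
print(flow_logp, true_logp)
```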

Divergence computation:

  • Exact: $O(d^2)$ for $d$-dimensional data
  • Hutchinson's estimator: $O(d)$ (stochastic)
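Hutchinson's estimator replaces the trace of the Jacobian with an average of $\epsilon^\top J \epsilon$ over random probes $\epsilon$, each needing only one Jacobian-vector product. A minimal NumPy sketch follows. The linear field $v(x) = Ax$ is an assumption chosen because its divergence is exactly $\mathrm{tr}(A)$, and the finite-difference `jvp` stands in for automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))  # hypothetical linear field v(x) = A x

def v(x):
    return A @ x

def jvp(f, x, eps, h=1e-5):
    # Finite-difference Jacobian-vector product J(x) @ eps
    return (f(x + h * eps) - f(x - h * eps)) / (2.0 * h)

def hutchinson_divergence(f, x, num_probes=10_000):
    total = 0.0
    for _ in range(num_probes):
        eps = rng.choice([-1.0, 1.0], size=x.shape)  # Rademacher probe
        total += eps @ jvp(f, x, eps)
    return total / num_probes

x = rng.standard_normal(d)
estimate = hutchinson_divergence(v, x)
print(estimate, np.trace(A))  # the estimate fluctuates around the exact trace
```

In training or likelihood evaluation a single probe per sample is common, trading variance for cost.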

7.3 Optimal Transport Connection

Benamou-Brenier Formula:

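The squared Wasserstein-2 distance between $p_0$ and $p_1$ can be expressed as:

$$W_2^2(p_0, p_1) = \inf_{(p_t, v)} \int_0^1 \int \| v(x, t) \|^2 \, p_t(x)\, dx\, dt$$

subject to the continuity equation and the boundary conditions $p_{t=0} = p_0$, $p_{t=1} = p_1$.

The minimizer moves each particle along a straight line at constant speed. The OT flow of Section 4.1 has this property on each conditional path, which is what leads to straight trajectories.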
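For 1-D Gaussians the quantities in this formula are available in closed form, which allows a quick numerical illustration. The sketch assumes $p_0 = \mathcal{N}(0, 1)$ and a hypothetical target $p_1 = \mathcal{N}(3, 0.5^2)$. It compares $W_2^2$ with the average squared speed of straight paths under the optimal (monotone) coupling and under the independent coupling used to build conditional paths.

```python
import numpy as np

M1, S1 = 3.0, 0.5  # hypothetical 1-D target N(3, 0.5^2); prior is N(0, 1)
rng = np.random.default_rng(0)
n = 200_000
x0 = rng.standard_normal(n)

# Closed-form W2^2 between 1-D Gaussians: (m0 - m1)^2 + (s0 - s1)^2
w2_squared = (0.0 - M1) ** 2 + (1.0 - S1) ** 2

# OT coupling: monotone map T(x0) = M1 + S1 * x0, straight paths at speed T(x0) - x0
kinetic_ot = np.mean((M1 + S1 * x0 - x0) ** 2)

# Independent coupling: x1 drawn without regard to x0
x1 = rng.normal(M1, S1, n)
kinetic_independent = np.mean((x1 - x0) ** 2)

print(w2_squared, kinetic_ot, kinetic_independent)
```

The optimal coupling attains $W_2^2$ up to sampling error, while independently paired straight lines cost more on average. This gap is one motivation for the straightening procedures discussed next.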

7.4 Rectified Flows

Key Idea: Iteratively straighten the flow trajectories.

Algorithm:

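  1. Train initial Flow Matching model
  2. Generate coupled pairs $(x_0, x_1)$ by sampling $x_0$ from the prior and integrating the learned ODE to obtain $x_1$
  3. Retrain model with straight-line interpolation: $x_t = (1-t)\, x_0 + t\, x_1$
  4. Repeat 2-3 times

Result: Nearly straight trajectories, which make generation with a single Euler step feasible.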
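Why re-pairing helps can be illustrated on a toy problem. In the sketch below (an assumption-laden illustration, not the reference procedure), the flow from $\mathcal{N}(0, 1)$ to a hypothetical target $\mathcal{N}(3, 0.5^2)$ is known in closed form, so step 2 needs no ODE solve. The residual variance of regressing $x_1 - x_0$ on $x_t$ measures how ambiguous the straight-line target is. A linear fit is sufficient here because all variables are jointly Gaussian.

```python
import numpy as np

M1, S1 = 3.0, 0.5  # hypothetical 1-D target N(3, 0.5^2); prior is N(0, 1)
rng = np.random.default_rng(0)
n = 50_000
x0 = rng.standard_normal(n)

# Round 0: independent pairs (x0, x1), as in plain conditional flow matching
x1_independent = rng.normal(M1, S1, n)

# Step 2: pairs produced by the flow. For this toy problem the exact flow
# map is x0 -> M1 + S1 * x0, so it is applied directly.
x1_flow = M1 + S1 * x0

def residual_variance(x0, x1, t=0.5):
    # Step 3 regresses x1 - x0 on x_t; leftover variance of the best linear
    # fit is zero exactly when x_t determines the target (straight paths)
    xt = (1.0 - t) * x0 + t * x1
    target = x1 - x0
    coeffs = np.polyfit(xt, target, deg=1)
    return np.var(target - np.polyval(coeffs, xt))

resid_independent = residual_variance(x0, x1_independent)
resid_flow = residual_variance(x0, x1_flow)
print(resid_independent, resid_flow)  # clearly positive vs. numerically zero
```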


8. Advanced Variants

8.1 Rectified Flow

Motivation: Straight trajectories are easier to integrate.

Method:

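  • Learn a velocity close to the straight-line direction: $v_\theta(x, t) = (x_1 - x_0) + \text{residual}$
  • Iteratively "rectify" the flow
  • Achieve 1-2 step generation with high quality

Loss:

$$\mathcal{L} = \mathbb{E}\left[ \| v_\theta(x_t, t) - (x_1 - x_0) \|^2 \right]$$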

8.2 Flow Matching with Prior Blending

Idea: Use learned prior instead of fixed Gaussian.

$$p_0(x) = \text{VAE latent distribution}$$

Benefits:

  • Lower transport cost
  • Faster convergence
  • Better sample quality

8.3 Multimodal Flow Matching

Challenge: Standard flows are deterministic mappings (bijective).

Solution: Use mixture of flows or stochastic interpolation.

$$x_t \sim \sum_k w_k\, \mathcal{N}\left(x \mid \mu_t^{(k)},\, (\sigma_t^{(k)})^2 I\right)$$

8.4 Comparison Table

| Variant | Trajectory | Steps | Quality | Training Cost |
| --- | --- | --- | --- | --- |
| Standard FM | Curved | 50-100 | High | Low |
| Rectified Flow | Straight | 1-10 | Very High | Medium |
| Prior Blending | Curved | 30-50 | Very High | Medium |
| Multimodal FM | Curved | 50-100 | High | High |

9. Applications

9.1 Text-to-Image Generation

Stable Diffusion + Flow Matching:

  • Replace diffusion ODE with Flow Matching
  • Faster training (no score matching)
  • Flexible flow design
  • Comparable or better FID scores

Example: SD3 (Stable Diffusion 3) uses rectified flows.

9.2 Molecular Generation

Advantages:

  • Continuous representation of molecules
  • Exact likelihood computation
  • Flexible prior design
  • Fast sampling

9.3 Audio Synthesis

Benefits:

  • High-fidelity audio generation
  • Faster than diffusion models
  • Controllable generation via conditioning

9.4 Video Generation

Temporal Flow Matching:

  • Model spatiotemporal dynamics
  • Straight trajectories reduce artifacts
  • Efficient sampling for long sequences

9.5 3D Generation

Point Cloud / Mesh Generation:

  • Continuous 3D structure modeling
  • Optimal transport preserves geometry
  • Fast generation for interactive applications

10. Practical Implementation

10.1 Network Architecture

Common choices:

  1. U-Net (from diffusion models):

    • Proven architecture
    • Multi-scale processing
    • Attention mechanisms
  2. Transformer:

    • Global receptive field
    • Scalable to high dimensions
    • Good for sequential data
  3. MLP (for low-dimensional data):

    • Simple and efficient
    • Good for toy examples

Time embedding:

```python
import math

import torch
import torch.nn as nn

class SinusoidalTimeEmbedding(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim  # expected to be even and at least 4

    def forward(self, t):
        # Sinusoidal embedding: t of shape (batch,) -> (batch, dim)
        device = t.device
        half_dim = self.dim // 2
        embeddings = math.log(10000) / (half_dim - 1)
        embeddings = torch.exp(torch.arange(half_dim, device=device) * -embeddings)
        embeddings = t[:, None] * embeddings[None, :]
        embeddings = torch.cat((embeddings.sin(), embeddings.cos()), dim=-1)
        return embeddings
```

10.2 Training Best Practices

1. Time sampling:

  • Uniform: $t \sim U[0, 1]$
  • Importance sampling: More weight on difficult regions

2. Data normalization:

  • Normalize data to $[-1, 1]$ or standardize it to zero mean and unit variance
  • Ensure numerical stability

3. Learning rate scheduling:

  • Warmup: Gradually increase LR
  • Cosine decay: Smooth decrease

4. Batch size:

  • Larger batches = more stable gradients
  • Typical: 64-256
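The warmup plus cosine decay schedule from item 3 can be written as a small pure function. This is a sketch. The default values for `base_lr`, `warmup_steps`, and `min_lr` are placeholders chosen for illustration, not recommendations from the source.

```python
import math

def lr_at_step(step, total_steps, base_lr=1e-4, warmup_steps=1000, min_lr=0.0):
    # Linear warmup from 0 to base_lr over the first warmup_steps
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr to min_lr over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

for step in (0, 500, 1000, 5500, 10000):
    print(step, lr_at_step(step, total_steps=10000))
```

The function can be called once per optimizer step and its value written into each parameter group's learning rate.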

10.3 Debugging Checklist

  • [ ] Verify boundary conditions: $p_0 = \mathcal{N}(0, I)$, $p_1 \approx$ data distribution
  • [ ] Check trajectory continuity (no jumps)
  • [ ] Monitor loss convergence
  • [ ] Test ODE solver with different step sizes
  • [ ] Validate likelihood computation
  • [ ] Compare sample quality with baseline

11. Comparison with Other Methods

11.1 Flow Matching vs [[Diffusion Model|Diffusion Models]]

| Aspect | Flow Matching | Diffusion Models |
| --- | --- | --- |
| Foundation | ODE theory | [[Stochastic Differential Equation (SDE)\|SDE]] theory |
| Objective | Vector field regression | Score matching / ELBO |
| Flexibility | High (arbitrary flows) | Constrained (noise schedule) |
| Training | Simple regression | Complex derivation |
| Sampling | ODE integration | [[Stochastic Differential Equation (SDE)\|SDE]]/ODE integration |
| Likelihood | Exact | Exact (via ODE) |
| Theory | Simpler | More complex |

11.2 Flow Matching vs [[Continuous Normalizing Flow]]

| Aspect | Flow Matching | Traditional CNF |
| --- | --- | --- |
| Training | Simulation-free | Requires ODE simulation |
| Speed | Fast | Slow (backprop through ODE) |
| Architecture | Flexible | Constrained (trace computation) |
| Scalability | High | Limited |

11.3 Flow Matching vs GAN

| Aspect | Flow Matching | GAN |
| --- | --- | --- |
| Training stability | Stable (MSE loss) | Unstable (minimax game) |
| Mode coverage | Complete | Mode collapse possible |
| Likelihood | Exact | Intractable |
| Sample quality | High | High |
| Sampling speed | Medium (ODE steps) | Fast (1 step) |

11.4 Generative Model Comparison

| Model | Training | Sampling | Likelihood | Stability | Quality |
| --- | --- | --- | --- | --- | --- |
| GAN | Adversarial | 1 step | Intractable | Unstable | High |
| VAE | ELBO | 1 step | Lower bound | Stable | Medium |
| Normalizing Flow | Likelihood | Parallel | Exact | Stable | Medium-High |
| Diffusion | Score matching | 50-1000 steps | Exact | Stable | Very High |
| Flow Matching | Vector regression | 10-100 steps | Exact | Stable | Very High |

12. Core Formula Cards

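[!QUOTE] Flow Matching Objective

$$\mathcal{L}_{FM}(\theta) = \mathbb{E}_{t,\; x \sim p_t(x)} \left[ \| v_\theta(x, t) - u_t(x) \|^2 \right]$$

[!QUOTE] Conditional Flow Matching

$$\mathcal{L}_{CFM}(\theta) = \mathbb{E}_{t,\; z,\; x \sim p_t(x \mid z)} \left[ \| v_\theta(x, t) - u_t(x \mid z) \|^2 \right]$$

[!QUOTE] Optimal Transport Flow

$$x_t = (1 - t)\, x_0 + t\, x_1, \quad u_t(x \mid x_1) = x_1 - x_0$$

[!QUOTE] Gaussian Conditional Flow

$$\mu_t(x_1) = t\, x_1, \quad \sigma_t = 1 - (1 - \sigma_{min})\, t$$

$$u_t(x \mid x_1) = \frac{x_1 - (1 - \sigma_{min})\, x}{1 - (1 - \sigma_{min})\, t}$$

[!QUOTE] Continuity Equation

$$\frac{\partial p_t(x)}{\partial t} = -\nabla_x \cdot \left[ v(x, t)\, p_t(x) \right]$$

[!QUOTE] Likelihood Computation

$$\log p_1(x_1) = \log p_0(x_0) - \int_0^1 \nabla_x \cdot v(x(t), t)\, dt$$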

13. Recent Advances (2023-2024)

13.1 Rectified Flow

Key Paper: Rectified Flow (Liu et al., 2022). A closely related formulation is Building Normalizing Flows with Stochastic Interpolants (Albergo et al., 2023).

Contributions:

  • Iterative straightening of trajectories
  • 1-2 step generation with high quality
  • Connections to optimal transport

13.2 Flow Matching for Large-Scale Generation

SD3 (Stable Diffusion 3):

  • Uses rectified flows instead of diffusion
  • Better sample quality
  • Faster training and sampling
  • Multimodal conditioning

13.3 Flow Matching + Consistency Models

Idea: Combine Flow Matching with consistency models for 1-step generation.

Method:

  1. Train Flow Matching model
  2. Distill to consistency model
  3. Achieve 1-step generation

13.4 Multimodal Flow Matching

Challenge: Standard flows are deterministic.

Solutions:

  • Mixture of flows
  • Stochastic interpolation
  • Latent variable models

  • [[Diffusion Model]]
  • [[Continuous Normalizing Flow]]
  • [[Probability Flow ODE]]
  • [[Stochastic Differential Equation (SDE)]]
  • [[Fokker-Planck Equation]]
  • [[Optimal Transport]]
  • [[Rectified Flows]]
  • [[Score Function]]
  • [[Neural ODE]]
  • [[DPM-Solver]]
  • [[DDIM]]
  • [[Wiener Process|Wiener Process]]
  • [[Markov Process]]
  • [[U-Net]]
  • [[Generative Adversarial Network (GAN)]]

Dataview Query

LIST
FROM #flow_matching OR #continuous_normalizing_flow OR #generative_model
SORT file.ctime DESC

References

  • Paper: Flow Matching for Generative Modeling (Lipman et al., 2023)
  • Paper: Building Normalizing Flows with Stochastic Interpolants (Albergo et al., 2023)
  • Paper: Rectified Flow (Liu et al., 2022)
  • Paper: Action Matching: Learning Stochastic Dynamics From Samples (Neklyudov et al., 2022)
  • Paper: SE(3)-Stochastic Flow Matching for Protein Backbone Generation (Bose et al., 2023)
  • Blog: Flow Matching: A New Paradigm for Generative Modeling - Lilian Weng
  • Course: CS236 Deep Generative Models (Stanford)
  • GitHub: https://github.com/atong01/conditional-flow-matching